2,928 research outputs found

    Clades and clans: a comparison study of two evolutionary models

    The Yule-Harding-Kingman (YHK) model and the proportional to distinguishable arrangements (PDA) model are two binary tree generating models that are widely used in evolutionary biology. Understanding the distributions of clade sizes under these two models provides valuable insights into macro-evolutionary processes, and is important in hypothesis testing and Bayesian analyses in phylogenetics. Here we show that these distributions are log-convex, which implies that very large clades or very small clades are more likely to occur under these two models. Moreover, we prove that there exists a critical value $\kappa(n)$ for each $n \geqslant 4$ such that for a given clade with size $k$, the probability that this clade is contained in a random tree with $n$ leaves generated under the YHK model is higher than that under the PDA model if $1<k<\kappa(n)$, and lower if $\kappa(n)<k<n$. Finally, we extend our results to binary unrooted trees, and obtain similar results for the distributions of clan sizes. Comment: 21 pages
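As a rough illustration of the YHK/PDA comparison in this abstract, the sketch below evaluates the probability that a fixed set of k leaves forms a clade in a random rooted binary tree on n leaves under each model. The closed forms are not stated in the abstract; they are standard results assumed here: 2n / (k(k+1) C(n,k)) for YHK, and φ(k)φ(n-k+1)/φ(n) for PDA, where φ(m) = (2m-3)!! counts rooted binary trees on m labelled leaves. All function names are illustrative.

```python
from fractions import Fraction
from math import comb


def num_rooted_trees(m):
    """phi(m) = (2m-3)!!: rooted binary trees on m labelled leaves (1 for m = 1, 2)."""
    result = 1
    for j in range(3, 2 * m - 2, 2):  # odd factors 3, 5, ..., 2m-3
        result *= j
    return result


def yhk_clade_prob(n, k):
    """Assumed YHK formula: probability that a fixed k-subset of n leaves is a clade."""
    return Fraction(2 * n, k * (k + 1) * comb(n, k))


def pda_clade_prob(n, k):
    """Assumed PDA formula: trees on the clade times trees with the clade collapsed to one leaf."""
    return Fraction(num_rooted_trees(k) * num_rooted_trees(n - k + 1),
                    num_rooted_trees(n))


def largest_k_favouring_yhk(n):
    """Largest clade size k with 1 < k < n at which YHK gives the higher probability."""
    return max(k for k in range(2, n) if yhk_clade_prob(n, k) > pda_clade_prob(n, k))


if __name__ == "__main__":
    for n in (4, 10, 20):
        print(n, largest_k_favouring_yhk(n))
```

The value returned by `largest_k_favouring_yhk` is only the last integer clade size at which YHK dominates; it brackets the critical value from below rather than computing $\kappa(n)$ itself. Exact rational arithmetic is used so that comparisons near the crossover are not affected by rounding.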

    SPSmart: adapting population based SNP genotype databases for fast and comprehensive web access

    Background: In the last five years large online resources of human variability have appeared, notably HapMap, Perlegen and the CEPH foundation. These databases of genotypes with population information act as catalogues of human diversity, and are widely used as reference sources for population genetics studies. Although many useful conclusions may be extracted by querying databases individually, the lack of flexibility for combining data from within and between each database does not allow the calculation of key population variability statistics. Results: We have developed a novel tool for accessing and combining large-scale genomic databases of single nucleotide polymorphisms (SNPs) in widespread use in human population genetics: SPSmart (SNPs for Population Studies). A fast pipeline creates and maintains a data mart from the most commonly accessed databases of genotypes containing population information: data is mined, summarized into the standard statistical reference indices, and stored into a relational database that currently handles as many as 4 × 10^9 genotypes and that can be easily extended to new database initiatives. We have also built a web interface to the data mart that allows the browsing of underlying data indexed by population and the combining of populations, allowing intuitive and straightforward comparison of population groups. All the information served is optimized for web display, and most of the computations are already pre-processed in the data mart to speed up the data browsing and any computational treatment requested. Conclusion: In practice, SPSmart allows populations to be combined into user-defined groups, while multiple databases can be accessed and compared in a few simple steps from a single query. It performs the queries rapidly and gives straightforward graphical summaries of SNP population variability through visual inspection of allele frequencies outlined in standard pie-chart format. In addition, full numerical description of the data is output in statistical results panels that include common population genetics metrics such as heterozygosity, Fst and In.
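The summary statistics named in this abstract can be illustrated with a small calculation. The abstract does not say which estimators SPSmart uses, so this sketch assumes the textbook definitions: expected heterozygosity H = 1 - Σ p_i², and Nei's Fst = (H_T - H_S) / H_T with equally weighted populations. Function names and the example frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity from allele frequencies that sum to 1."""
    return 1.0 - sum(p * p for p in freqs)


def fst(pop_freqs):
    """Nei's Fst for one locus, assuming equally weighted populations.

    pop_freqs holds one list of allele frequencies per population,
    with alleles in the same order in every list.
    """
    n_pops = len(pop_freqs)
    n_alleles = len(pop_freqs[0])
    # H_S: mean within-population heterozygosity.
    h_s = sum(expected_heterozygosity(f) for f in pop_freqs) / n_pops
    # H_T: heterozygosity of the pooled (mean) allele frequencies.
    pooled = [sum(f[a] for f in pop_freqs) / n_pops for a in range(n_alleles)]
    h_t = expected_heterozygosity(pooled)
    if h_t == 0:
        return 0.0  # locus is monomorphic across all populations
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical biallelic SNP in two user-defined population groups.
    print(fst([[0.9, 0.1], [0.1, 0.9]]))
```

Combining populations into a user-defined group, as the abstract describes, would correspond here to pooling their allele counts before computing frequencies; that step is left out of the sketch.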

    Genomic microsatellites identify shared Jewish ancestry intermediate between Middle Eastern and European populations

    Background: Genetic studies have often produced conflicting results on the question of whether distant Jewish populations in different geographic locations share greater genetic similarity to each other or, instead, to nearby non-Jewish populations. We perform a genome-wide population-genetic study of Jewish populations, analyzing 678 autosomal microsatellite loci in 78 individuals from four Jewish groups together with similar data on 321 individuals from 12 non-Jewish Middle Eastern and European populations. Results: We find that the Jewish populations show a high level of genetic similarity to each other, clustering together in several types of analysis of population structure. Further, Bayesian clustering, neighbor-joining trees, and multidimensional scaling place the Jewish populations as intermediate between the non-Jewish Middle Eastern and European populations. Conclusion: These results support the view that the Jewish populations largely share a common Middle Eastern ancestry and that over their history they have undergone varying degrees of admixture with non-Jewish populations of European descent.

    Straightforward Inference of Ancestry and Admixture Proportions through Ancestry-Informative Insertion Deletion Multiplexing

    Ancestry-informative markers (AIMs) show high allele frequency divergence between different ancestral or geographically distant populations. These genetic markers are especially useful in inferring the likely ancestral origin of an individual or estimating the apportionment of ancestry components in admixed individuals or populations. The study of AIMs is of great interest in clinical genetics research, particularly to detect and correct for population substructure effects in case-control association studies, but also in population and forensic genetics studies.

    Effective selection of informative SNPs and classification on the HapMap genotype data

    Background: Because single nucleotide polymorphisms (SNPs) are genetic variations that distinguish any two unrelated individuals, SNPs can be used to identify the correct source population of an individual. For efficient population identification with the HapMap genotype data, as few informative SNPs as possible are required from the original 4 million SNPs. Recently, Park et al. (2006) adopted the nearest shrunken centroid method to classify the three populations, i.e., Utah residents with ancestry from Northern and Western Europe (CEU), Yoruba in Ibadan, Nigeria in West Africa (YRI), and Han Chinese in Beijing together with Japanese in Tokyo (CHB+JPT), from which 100,736 SNPs were obtained and the top 82 SNPs could completely classify the three populations. Results: In this paper, we propose to first rank each feature (SNP) using a ranking measure, i.e., a modified t-test or F-statistics. Then from the ranking list, we form different feature subsets by sequentially choosing different numbers of features (e.g., 1, 2, 3, ..., 100) with top ranking values, train and test them by a classifier, e.g., the support vector machine (SVM), thereby finding one subset which has the highest classification accuracy. Compared to the classification method of Park et al., we obtain a better result, i.e., good classification of the 3 populations using on average 64 SNPs. Conclusion: Experimental results show that both the modified t-test and the F-statistics methods are very effective in ranking SNPs by their classification capability. Combined with the SVM classifier, a desirable feature subset (with the minimum size and the greatest informativeness) can be quickly found in a greedy manner after ranking all SNPs. Our method is able to identify a very small number of important SNPs that can determine the populations of individuals.
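The rank-then-select procedure in this abstract can be sketched on toy data. This is not the authors' implementation: the ranking uses a plain one-way ANOVA F statistic rather than their modified t-test, and a nearest-centroid classifier is swapped in for the SVM to keep the example dependency-free. The genotype matrix (coded 0/1/2), the labels, and all function names are made up for illustration.

```python
from statistics import mean


def f_statistic(values, labels):
    """One-way ANOVA F statistic of a single feature across classes."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    grand, k, n = mean(values), len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def rank_features(X, y):
    """Feature indices sorted from most to least discriminative."""
    scores = [f_statistic([row[j] for row in X], y) for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def centroid_predict(X_train, y_train, x, features):
    """Nearest-centroid prediction restricted to the chosen features."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(y_train)):
        rows = [r for r, lab in zip(X_train, y_train) if lab == label]
        dist = sum((x[j] - mean([r[j] for r in rows])) ** 2 for j in features)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def best_subset(X_train, y_train, X_test, y_test, max_features):
    """Try the top-1, top-2, ... ranked features; keep the smallest most accurate subset."""
    order = rank_features(X_train, y_train)
    best_acc, best_feats = -1.0, []
    for m in range(1, max_features + 1):
        feats = order[:m]
        hits = sum(centroid_predict(X_train, y_train, x, feats) == t
                   for x, t in zip(X_test, y_test))
        acc = hits / len(y_test)
        if acc > best_acc:  # strict '>' prefers the smaller subset on ties
            best_acc, best_feats = acc, feats
    return best_acc, best_feats


# Toy data: three populations, two informative SNPs (columns 0, 1), one noise SNP.
X_TRAIN = [[0, 0, 1], [0, 0, 2], [1, 0, 0],
           [2, 0, 2], [2, 0, 0], [2, 1, 1],
           [2, 2, 1], [1, 2, 0], [2, 2, 2]]
Y_TRAIN = ["A"] * 3 + ["B"] * 3 + ["C"] * 3
X_TEST = [[0, 0, 2], [2, 0, 1], [2, 2, 0]]
Y_TEST = ["A", "B", "C"]

if __name__ == "__main__":
    print(best_subset(X_TRAIN, Y_TRAIN, X_TEST, Y_TEST, max_features=3))
```

In a realistic setting the accuracy for each subset size would be estimated by cross-validation rather than on a single held-out split.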

    Heterozygosity increases microsatellite mutation rate, linking it to demographic history

    Background: Biochemical experiments in yeast suggest a possible mechanism that would cause heterozygous sites to mutate faster than equivalent homozygous sites. If such a process operates, it could undermine a key assumption at the core of population genetic theory, namely that mutation rate and population size are independent, because population expansion would increase heterozygosity that in turn would increase mutation rate. Here we test this hypothesis using both direct counting of microsatellite mutations in human pedigrees and an analysis of the relationship between microsatellite length and patterns of demographically-induced variation in heterozygosity. Results: We find that microsatellite alleles of any given length are more likely to mutate when their homologue is unusually different in length. Furthermore, microsatellite lengths in human populations do not vary randomly, but instead exhibit highly predictable trends with both distance from Africa, a surrogate measure of genome-wide heterozygosity, and modern population size. This predictability remains even after statistically controlling for non-independence due to shared ancestry among populations. Conclusion: Our results reveal patterns that are unexpected under classical population genetic theory, where no mechanism exists capable of linking allele length to extrinsic variables such as geography or population size. However, the predictability of microsatellite length is consistent with heterozygote instability and suggests that this has an important impact on microsatellite evolution. Whether similar processes affect single nucleotide polymorphisms remains unclear.

    Geographical Affinities of the HapMap Samples

    The HapMap samples were collected for medical-genetic studies, but are also widely used in population-genetic and evolutionary investigations. Yet the ascertainment of the samples differs from most population-genetic studies which collect individuals who live in the same local region as their ancestors. What effects could this non-standard ascertainment have on the interpretation of HapMap results? We compared the HapMap samples with more conventionally-ascertained samples used in population- and forensic-genetic studies, including the HGDP-CEPH panel, making use of published genome-wide autosomal SNP data and Y-STR haplotypes, as well as producing new Y-STR data. We found that the HapMap samples were representative of their broad geographical regions of ancestry according to all tests applied. The YRI and JPT were indistinguishable from independent samples of Yoruba and Japanese in all ways investigated. However, both the CHB and the CEU were distinguishable from all other HGDP-CEPH populations with autosomal markers, and both showed Y-STR similarities to unusually large numbers of populations, perhaps reflecting their admixed origins. The CHB and JPT are readily distinguished from one another with both autosomal and Y-chromosomal markers, and results obtained after combining them into a single sample should be interpreted with caution. The CEU are better described as being of Western European ancestry than of Northern European ancestry as often reported. Both the CHB and CEU show subtle but detectable signs of admixture. Thus the YRI and JPT samples are well-suited to standard population-genetic studies, but the CHB and CEU less so.

    Positive Selection in East Asians for an EDAR Allele that Enhances NF-κB Activation

    Genome-wide scans for positive selection in humans provide a promising approach to establish links between genetic variants and adaptive phenotypes. From this approach, lists of hundreds of candidate genomic regions for positive selection have been assembled. These candidate regions are expected to contain variants that contribute to adaptive phenotypes, but few of these regions have been associated with phenotypic effects. Here we present evidence that a derived nonsynonymous substitution (370A) in EDAR, a gene involved in ectodermal development, was driven to high frequency in East Asia by positive selection prior to 10,000 years ago. With an in vitro transfection assay, we demonstrate that 370A enhances NF-κB activity. Our results suggest that 370A is a positively selected functional genetic variant that underlies an adaptive human phenotype.